(54) System and method for audio-visual content verification



(57) The invention provides a method for video con- 
tent verification, operative to compare and verify the 
content of a first audio-visual stream with the content of 
a second audio-visual stream, comprising the steps of 
extracting characteristic data from a first audio-visual
stream, extracting characteristic data from a second au- 
dio-visual stream, and comparing the extracted charac- 
teristic data from the first and second audio-visual 
streams. The invention also provides a system for car- 
rying out the method. 
Description

Field of the invention 

The present invention relates to audio-visual test 
and measurement systems and more particularly to a 
method and apparatus for comparing a given content 
stream with a reference content stream for verifying the 
correctness of a given data stream and for detecting var- 
ious content-related problems, such as missing or dis- 
torted content, as well as badly synchronized content 
streams such as audio or sub-titles delayed with respect 
to the video stream. 

"Audio-visual content" is herein defined as a stream 
or sequence of video, audio, graphics (sub-pictures) 
and other data where the semantics of the data stream 
is of value. The term "stream" or "sequence" is of particular
importance, since it is assumed that the ordering
of content elements along a time or space line consti- 
tutes part of the content. 

Background of the Invention 

Elementary content streams may be combined into a
composite stream. Starting with a simple monophonic 
audio or video transmission, an application which in- 
volves two video streams (for stereoscopic display), six 
or eight surround audio channels and several sub-pic- 
ture channels can be formed. Generally, the relative 
alignment of these streams is highly significant and 
should be verified. 

In known systems, the video signal is analyzed to detect
disturbances of that signal, such as illegal colors.
An "illegal color" is one that is outside the
practical limit set for a particular format. Other types of 
video measurement involve injecting known signals at 
the source and evaluating certain properties thereof at 
the receiving end. 

With the introduction of the serial digital interface 
(SDI) standard, now used as a carrier for video, audio 
and data, error detection schemes are designed for test- 
ing data integrity. Such a scheme has already been pro- 
posed. 

The known video test and measurement systems 
are, however, generally not capable of detecting con- 
tent-related problems, such as missing or surplus 
frames, program time shift, color or luminance distor- 
tions which are within the acceptable parameter range, 
mis-alignment of content streams such as audio or sub- 
pictures with respect to video, etc. 

In many facilities, an observer will look at the display
to detect quality problems. An experienced operator
may detect and interpret a variety of problems in recording
and transmission. An observer can do good rule-based
or subjective evaluation of video content. However,
human inspection of content is costly and unpredictable.
Additionally, some content-related defects cannot
be detected by an observer.



As state of the art content delivery technologies
such as multi-channel Digital TV, Digital Video Disk and
the Internet provide more content and interactivity,
content-related problems are more likely to occur, since the
path from the content sources to the end-user becomes
more complicated. Additionally, the huge amounts of
content generated, edited, recorded and transmitted in
multiple channels and multiple distribution slots (such
as video-on-demand) make human inspection almost
impossible.

It is therefore a broad object of the invention to provide
a computerized method and system for comparing
a given content stream with a reference content stream,
for verifying that the given stream is in fact the correct
one and to detect various content-related defects.

In many cases, the reference stream consists of the 
original program material and the actual stream consists 
of the broadcast or played content. In other cases, the 
designation of one stream as the reference stream is 
arbitrary, for example, comparing one content stream 
with a backup stream. However, for convenience of de- 
scription hereinafter the terms "reference content 
stream" and "actual content stream" will be used, with- 
out limiting the generality of the invention. 

For illustrative purposes only, the invention will be
described by two applications: broadcast automation
and digital versatile disc (DVD) pre-mastering. This
description, however, is not intended to limit the generality
of the invention or its applicability to other domains.

Today's multi-channel, multi-program applications
cannot be controlled manually. Including commercials
and program trailers, a daily schedule may consist of
hundreds of video segments, intended to play seamlessly.
Such a schedule is usually implemented by an
automation system. The schedule is logged into the system
as some form of a table (a "play-list") describing the
program's name, start time, duration and source, e.g.,
storage media, unique identifier, time-code of first
frame.

The storage media can be a tape or a digital file.
Generally, the program source material is organized in
a hierarchical manner, with most of the content stored
off-line. The forthcoming programs are loaded on a tape
machine and sometimes, as in the case of a commercial
or trailer, digitized to a disk-based server. The complex
paths of the various elements of content may further
increase the content mismatch probability.

An example of such an automation system is the
ADC-100 from Louth Automation. ADC-100 can run up
to 16 lists simultaneously, and control multiple devices
including disk servers, video servers, tape machines,
cart machines, VTRs, switchers, character generators
and audio carts. The present invention can verify the
identity and integrity of the broadcast content, providing
important feedback for the automation system or facility
manager.

DVD is a new generation of the compact disc format
which provides increased storage capacity and performance,
especially for video and multimedia applications.
DVD for video is capable of storing eight audio tracks
and thirty-two "sub-picture" tracks, which are used for
subtitles, menus, etc. These can be used to put several
selectable languages on each disc. The interactive
capabilities of consumer DVD players include menus with
a small set of navigation and control commands, with
some functions for dynamic video stream control, such
as seamless branching, which can be used for playing
different "cuts" of the same video material for dramatic
purposes, censorship, etc. DVD-ROM, which will be
used for multi-media applications, will exhibit a higher
level of interactivity.

Since DVD contains multiple content streams with
many options for branching from one stream to the other
or combining several streams, such as a menu or
sub-titles overlaid on a video frame, one has to verify that a
given set of initial settings, followed by a specific set of
navigation commands, indeed produces the correct
content. This step in DVD production is known as
"emulation", currently designed to be performed by an
observer. The present invention also allows automation of
DVD emulation.

It is important to note that in DVD, the video image
is composed of the motion picture stream overlaid by
sub-pictures or graphics, such as sub-titling. Although
all video streams and all sub-picture bitmaps are available
before emulation takes place, the composite image
depends on the actual user's choices and the user's
"navigation" in the content tree. It is impractical to generate
all possible compositions prior to emulation and
use these as the reference content. Therefore, descriptors
of the actual content must be compared against appropriate
descriptors of the component streams.

In both broadcast and DVD applications, it may be
necessary to detect video compression artifacts. While
some of these are due to the mathematical compression
itself, others may arise during transmission/playback,
due to buffer overflow and other reasons. A common
image compression artifact is "blockiness", or the visibility
of edges between image blocks. Detecting artifacts
in a completely rule-based manner, such as looking for
these edges, may be misleading since such edges may
be present in the original, uncompressed image. An
image-reference based approach in which the compressed
image is compared with the original image provides
a good tool for algorithm evaluation. However, in
a practical situation, such an image will not be available
at the receiving/playback end for real-time detection of
compression artifacts. It is therefore necessary to compare
compressed material with the original material,
based on concise content descriptors computed from
both streams.

It is an object of the present invention to provide a
content verification system in which an audio-visual program
broadcast or recorded on storage media can be
compared with a reference program.

The audio-visual program comprises at least one
video channel, or at least one audio channel, or at least
one sub-picture channel comprising sub-titles, closed-captions
and any kind of auxiliary graphics information
which is timed synchronously with the video or audio.
While in certain applications sub-pictures are embedded
in the video image sequence, in other applications
they are carried by a separate stream/file.

Summary of the Invention 

The present invention therefore provides a method 
of comparing the content obtained by broadcast or play- 
back with a reference content, including the steps of ex- 
tracting frame characteristic data streams from said ref- 
erence content and from actual received or playback 
content, aligning said streams and comparing said 
streams on a frame-by-frame basis. 

U.S. Patent No. 5,339,166, entitled "Motion-Dependent
Image Classification for Editing Purposes," describes
a system for comparing two or more versions,
typically of different dubbing languages, of the same
feature film. By identifying camera shot boundaries in
both versions and comparing sequences of shot length,
a common video version, comprising camera shots
which exist in all versions, can be automatically generated.
While the embodiment described in this patent allows,
in principle, the location of content differences between
versions at camera shot level, frame-by-frame
alignment for all frames in the respective version is not
performed. Further, the differences detected are in the
existence or absence of video frames as a whole. In contrast,
the present invention allows frame-by-frame inspection
of color properties, detection of compression
artifacts, audio distortions, etc.

Furthermore, in the U.S. patent, the content of each
frame is fixed and characteristic data are computed from
the content. The present invention, on the other hand,
addresses the on-line composition of a content stream
from basic content streams, such that characteristic data
are pre-computed only for these basic streams. Given
the branching/navigation/editing commands, a composite
reference characteristic data stream is predicted
from the component characteristic data streams and then
compared with the actual content stream.

Moreover, the present invention does not depend
on the specific format/representation of the content
sources and streams. In the same application, one
stream may be analog and the other digital. Additionally,
one stream may be compressed and the other may be
of full bandwidth. Typically, in a broadcast environment,
the input will be CCIR-601 digital video and AES digital
audio. Multiple audio streams may be due to different
dubbing languages, as well as stereo and surround
sound channels.

Generally, the extraction of characteristic data will
be done in real-time, thus saving intermediate storage
and also enabling real-time error detection in a broadcasting
environment. However, this is not a limitation,
since the present invention can be used off-line by recording
both the reference and the actual audio-visual
program. When working off-line, processing can be
slower than real-time or faster, depending on the computational
resources. When verifying dubs or copies of
video cassettes, a faster than real-time performance
may be needed, depending, of course, on the availability
of a suitable analog to digital converter which can cope
with fast-forward video signals.

Brief Description of the Drawings 

The invention will now be described in connection
with certain preferred embodiments with reference to
the following illustrative figures so that it may be more 
fully understood. 

With specific reference now to the figures in detail, 
it is stressed that the particulars shown are by way of 
example and for purposes of illustrative discussion of 
the preferred embodiments of the present invention on- 
ly, and are presented in the cause of providing what is 
believed to be the most useful and readily understood 
description of the principles and conceptual aspects of 
the invention. In this regard, no attempt is made to show 
structural details of the invention in more detail than is 
necessary for a fundamental understanding of the in- 
vention, the description taken with the drawings making 
apparent to those skilled in the art how the several forms 
of the invention may be embodied in practice. 

In the drawings: 

Fig. 1 is a block diagram of a top level flow of 
processing of an audio-visual content verification 
system; 

Fig. 2 is a block diagram of a circuit for storing
detected content problems;

Fig. 3 schematically illustrates an array of video se- 
quence characteristic data; 

Fig. 4 schematically illustrates an array of video 
frame or still image spatial characteristic data; 
Fig. 5 schematically illustrates a set of regions in a 
video frame; 

Fig. 6 schematically illustrates relative location of 
graphics sub-pictures with respect to the video 
frame; 

Fig. 7 is a block diagram illustrating extraction of 
sub-title characteristic data; 

Fig. 8 is a block diagram illustrating sub-title image
sequence processing;

Fig. 9 schematically depicts a record of sub-picture
characteristic data;

Fig. 10 is a block diagram illustrating derivation of
audio characteristic data;

Fig. 11 is a block diagram of a circuit for the selec- 
tion of anchor frames for coarse alignment; 
Fig. 12 is a block diagram of a circuit for alignment
of a composite stream with the component reference
streams;

Fig. 13 is a block diagram of a circuit for frame
verification processing; and

Fig. 14 is a block diagram of a characteristic data 
design workstation. 

Detailed Description of Preferred Embodiments

With reference now to the drawings, Fig. 1 shows a
top level flow of processing of an audio-visual content
verification system according to the present invention.
Reference sub-picture stream 11, video stream 12 and
audio stream 13 are stored in their respective stores 14,
15 and 16, to be eventually processed by processors
17, 18 and 19, respectively. The combination of sub-pictures
with video, as well as transition/branching between
program segments, is applied at characteristic
data level by predictor 20, driven by navigation/playback
commands 21.

Actual video stream 22 and audio stream 23 are
stored in their respective stores 24 and 25, to be later
processed by processors 26 and 27, respectively. The
video stream 22 and the corresponding characteristic
data are composed of video and sub-pictures.

Once in the characteristic data stores 28 and 29,
the data streams are input to the characteristic data
alignment processor 30, resulting in frame-aligned
characteristic data. The alignment process also results in a
program time-shift value, as well as indices or time-codes
of missing or surplus frames. Once the data are
frame-aligned, characteristic data are compared on a
frame-by-frame basis in comparator 32, yielding a frame
quality report.

Fig. 2 shows means for storing detected content
problems. Recently played/received video from store 24
undergoes compression in engine 34 and is then stored
in buffer 35. The recently played/received audio from
store 25 is directly stored in buffer 36. Transfer controller
37 is activated by verification reports 38 to transfer
the content into hard disk storage 39, where it can be
later analyzed.

Fig. 3 shows an array of video sequence characteristic
data 40. The list comprises image difference measures,
as well as image motion vectors. These measures
may include properties of the histogram of the difference
image, obtained by subtracting two adjacent images, as
is known per se. In particular, the "span" characteristic
data, defined as the difference in gray levels between a
high (e.g., 85) percentile and a low (e.g., 15) percentile
of said histogram, was found to be useful. Alternatively,
a measure of difference of intensity histogram of two
adjacent images, also by a known technique, may be used.
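
The "span" measure can be illustrated with a minimal sketch. It
assumes each frame is given as a flat list of gray levels of equal
length and uses a nearest-rank percentile; the function names and the
percentile convention are illustrative assumptions, not part of the
described system.

```python
def percentile(sorted_vals, pct):
    """Nearest-rank percentile of an already sorted list (pct in 0..100)."""
    if not sorted_vals:
        raise ValueError("empty input")
    # ceil(pct * n / 100), clamped so that the rank is at least 1
    rank = max(1, -(-pct * len(sorted_vals) // 100))
    return sorted_vals[rank - 1]


def span(prev_frame, cur_frame, low_pct=15, high_pct=85):
    """Span of the difference image of two adjacent gray-level frames.

    The difference image is cur_frame - prev_frame, pixel by pixel; the
    span is the gap between a high and a low percentile of its values.
    """
    if len(prev_frame) != len(cur_frame):
        raise ValueError("frames must have the same number of pixels")
    diffs = sorted(c - p for p, c in zip(prev_frame, cur_frame))
    return percentile(diffs, high_pct) - percentile(diffs, low_pct)


if __name__ == "__main__":
    still = [10] * 10
    print(span(still, still))                  # identical frames
    print(span([0] * 10, list(range(10))))     # graded change
```

A static scene gives a span near zero, while a cut or strong motion
widens the difference histogram and raises the span.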

Motion vector fields are computed at pre-determined
locations while using a block-matching motion
estimation algorithm. Alternatively, a more concise
representation may consist of camera motion parameters,
preferably estimated from image motion vector fields.

Fig. 4 shows an array of video frame or still image
spatial characteristic data. The list comprises color
characteristic data 41, texture characteristic data 42 and
statistics derived from image regions. Such statistics may
include the mean, the variance and the median of luminance
values. Useful color characteristic data include
the first three moments: average, variance and skewness
of color components:



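$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} p_{ij}, \qquad
\sigma_i^2 = \frac{1}{N}\sum_{j=1}^{N} (p_{ij} - \mu_i)^2, \qquad
s_i = \frac{1}{N}\sum_{j=1}^{N} (p_{ij} - \mu_i)^3$$

where $p_{ij}$ is the value of the i-th color space component
of the j-th image pixel and $N$ is the number of pixels. Color
spaces of convenience may include the (R,G,B) representation
or the (Y,U,V), which provides luminance characteristic data
through the Y component.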
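
As a rough illustration of the three color moments, the sketch below
computes them for one color component over a region of support. It
assumes the component is given as a flat list of pixel values and takes
"skewness" to mean the unnormalized third central moment; both choices
are assumptions made for the example.

```python
def color_moments(channel):
    """Average, variance and third central moment of one color component.

    `channel` is a flat list of pixel values for a single component
    (for example Y, or R) over the chosen region of support.
    """
    n = len(channel)
    if n == 0:
        raise ValueError("empty region of support")
    mean = sum(channel) / n
    variance = sum((p - mean) ** 2 for p in channel) / n
    third = sum((p - mean) ** 3 for p in channel) / n
    return mean, variance, third


if __name__ == "__main__":
    # A tiny 3-pixel region with one bright pixel.
    print(color_moments([0, 0, 3]))
```

Three numbers per component per region give a compact descriptor: nine
values for an (R,G,B) or (Y,U,V) region.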

Texture provides measures to describe the structural
composition, as well as the distribution, of image
gray-levels. Useful texture characteristic data are derived
from spatial gray-level dependence matrices. These in- 
clude measures such as energy, entropy and correla- 
tion. 

The selection of characteristic data for a specific
application of content verification is important. Texture and
color data are important for matching still images. Video
frame sequences with significant motion can be aligned 
by motion characteristic data. For more static sequenc- 
es, color and texture data can facilitate the alignment 
process. 

When computing color and texture characteristic
data, the region of support, that is, the image region on
which these data are computed, is significant. Using the
entire image, or most of it, is preferred when robustness
and reduced storage are required. On the other hand,
deriving multiple characteristics at numerous, relatively
small image regions has two important advantages:

1) better spatial discrimination power (like a low
resolution image); and

2) when overlaid by sub-picture (graphics), those
regions which do not intersect with graphics data
can still be matched with the corresponding characteristic
data of the original video frame.

Fig. 5 shows a set of regions 42 in a video frame
43, such that color or texture characteristic data are
computed for each such region. Fig. 6 illustrates the relative
location of graphics sub-pictures with respect to
the video frame. Number 44 represents a sub-title
sub-picture and number 45 represents a menu-item
sub-picture.

Figs. 7 and 8 show the extraction of sub-title
characteristic data. Sub-titles or closed captions in a movie
are used to bring translated dialogues to the viewer.
Generally, a sub-title will occupy several dozen frames.
A suitable form for sub-title characteristic data is
time-code-in, time-code-out of that specific sub-title, with
additional data describing the sub-title bitmap. The sub-title
image sequence processor 46 analyses every video
frame of the sequence to detect specific frames at which
sub-title information is changed. The result is a sequence
of sub-title bitmaps, with the frame interval each
such bitmap occupies in a time-code-in, time-code-out
representation. Characteristic data are then extracted
by unit 47 from the sub-title bitmap.

Fig. 8 shows the sub-title image sequence processor
46. The video image passes through a character
binarization processor 48, operative to identify pixels
belonging to sub-title characters and paint them white, for
example, where the background pixels are painted
black. At every frame, the current frame bitmap 49 is
compared, or matched, with the stored sub-title bitmap
from the first instance of that bitmap. At the first mismatch
event, the sub-title bitmap is reported with the
corresponding time-code interval, and a new matching
cycle begins.
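
The matching cycle can be sketched as follows. This is a simplified
illustration that assumes each frame has already been binarized into a
hashable bitmap (or None when no sub-title is present), uses frame
indices in place of time-codes, and substitutes exact equality for the
template matching described in the text.

```python
def subtitle_intervals(bitmaps):
    """Group per-frame sub-title bitmaps into (bitmap, frame_in, frame_out).

    `bitmaps` holds one entry per video frame; None means no sub-title.
    A new matching cycle starts at the first frame whose bitmap differs
    from the stored one.
    """
    intervals = []
    stored, start = None, 0
    for index, bitmap in enumerate(bitmaps):
        if bitmap != stored:                      # first mismatch event
            if stored is not None:
                intervals.append((stored, start, index - 1))
            stored, start = bitmap, index         # begin a new cycle
    if stored is not None:                        # close the last cycle
        intervals.append((stored, start, len(bitmaps) - 1))
    return intervals


if __name__ == "__main__":
    frames = ["HELLO", "HELLO", None, "BYE", "BYE", "BYE"]
    for bitmap, frame_in, frame_out in subtitle_intervals(frames):
        print(bitmap, frame_in, frame_out)
```

In practice the equality test would be replaced by a tolerant match so
that noise or slight mis-registration does not split one sub-title into
several intervals.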

The matching process can be implemented by a
number of binary template-matching or correlation
algorithms. The spatial search range of the template-matching
should accommodate mis-registration of a sub-title
and additionally the case of scrolling sub-titles.

The characteristic data of a single sub-title should
be concise and allow for efficient matching. The sub-title
bitmap, usually run-length coded, is a suitable representation.
Alternatively, one could use shape features of individual
characters and a sub-title text string, using OCR
software.

In addition to text, sub-pictures consist of graphics
elements such as bullets, highlight or shadow rectangles,
etc. Useful characteristic data are obtained by using
circle and rectangle detectors. Fig. 9 shows a record
of sub-pictures characteristic data.

Fig. 10 shows the derivation of audio characteristic
data. When in analog form, the signal is digitized by the
arrangement comprising an analog anti-aliasing filter 51
and an A/D converter 52 and then filtered by the
pre-emphasis filter 53. Spectral analysis uses a digital filter
bank 54, 54' ... 54^n. The filter output is squared and
integrated by the power estimation unit 55, 55' ... 55^n. The
set of characteristic data is computed for each video
frame duration (40 msec for PAL, or 33.3 msec for NTSC)
and stored in store 56. Window duration controls
the amount of averaging or smoothing used in power
computation. Typically, a 60 or 50 msec window, for an
overlap of 33%, can be used.
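
The squaring and integration step for a single filter band might look
like the sketch below. It assumes the band-filtered signal is already
available as a list of samples and that windows start at multiples of
the frame duration; the function name and parameters are illustrative.

```python
def frame_power(samples, sample_rate, frame_ms=40.0, window_ms=60.0):
    """Mean squared value of one band, one estimate per video frame.

    A window of `window_ms` is advanced by `frame_ms` (one video frame),
    so consecutive windows overlap by (window - frame) / window.
    """
    hop = int(round(sample_rate * frame_ms / 1000.0))
    win = int(round(sample_rate * window_ms / 1000.0))
    powers = []
    start = 0
    while start + win <= len(samples):
        window = samples[start:start + win]
        powers.append(sum(s * s for s in window) / win)
        start += hop
    return powers


if __name__ == "__main__":
    # 200 ms of a constant unit signal sampled at 1 kHz (toy values).
    print(frame_power([1.0] * 200, sample_rate=1000))
    # Overlap for PAL timing: (60 - 40) / 60 is about one third.
    print((60.0 - 40.0) / 60.0)
```

With a 60 msec window and a 40 msec frame, or a 50 msec window and a
33.3 msec frame, one third of each window is shared with the next one.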

The filter bank is a series of linear phase FIR filters,
so that the group delay for all filters is zero and the output
signals from the filters are synchronized in time. Each
filter is specified by its center frequency and its
bandwidth.

In many instances, the reference characteristic data
stream is not available explicitly, but has to be derived
from said source characteristic data and from playback
commands such as those denoted in Fig. 1. A simple case is
when a program consists of consecutive multiple content
segments. Each such segment is specified by a
source content identifier, a beginning time-code and an
ending time-code. Said reference characteristic data
stream can be constructed or predicted from the corresponding
segments of source characteristic data by
means of concatenation. If content verification involves
computing the actual content segment insertion points,
these source characteristic data segments will be padded
by characteristic data margins to allow for inaccuracies
in insertion.
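
Prediction by concatenation can be sketched in a few lines. The
example assumes source characteristic data are stored as one list per
source identifier, indexed by frame number, and that time-codes have
already been converted to inclusive frame indices; these data layouts
are assumptions for illustration.

```python
def padded_segments(sources, playlist, margin=0):
    """Cut one characteristic data segment per play-list entry.

    `sources` maps a source identifier to a list of per-frame data and
    `playlist` is a list of (source_id, first_frame, last_frame), both
    ends inclusive. Each segment is padded by up to `margin` frames on
    either side, clipped to the available source data.
    """
    segments = []
    for source_id, first, last in playlist:
        data = sources[source_id]
        lo = max(0, first - margin)
        hi = min(len(data) - 1, last + margin)
        segments.append(data[lo:hi + 1])
    return segments


def predict_reference(sources, playlist):
    """Concatenate the unpadded segments into one reference stream."""
    reference = []
    for segment in padded_segments(sources, playlist, margin=0):
        reference.extend(segment)
    return reference


if __name__ == "__main__":
    sources = {"film": [1, 2, 3, 4, 5], "ad": [10, 20, 30]}
    playlist = [("film", 1, 2), ("ad", 0, 1)]
    print(predict_reference(sources, playlist))
    print(padded_segments(sources, playlist, margin=1))
```

The padded form keeps segments separate so that each insertion point
can be searched for within its own margin.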

Sometimes the transitions involve not only cuts, but 
also dissolves or fades. When the composite image is 
a linear combination of two source images, some char- 
acteristic data can be predicted based on the original 
source data as well as the blending values. These data 
include, for example, color moments computed over 
some region of support. In alignment and verification,
the predicted values are compared against the actual 
values. 
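
For the first color moment the prediction is exact: if the composite
pixel is alpha*a + (1 - alpha)*b, the composite mean is the same blend
of the two source means. The sketch below checks this numerically; the
frames, the blending value and the function names are toy assumptions,
and only the mean is shown, since higher moments do not blend linearly
in general.

```python
def blend(frame_a, frame_b, alpha):
    """Pixel-wise linear combination used in a dissolve or fade."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(frame_a, frame_b)]


def mean(values):
    return sum(values) / len(values)


def predicted_mean(mean_a, mean_b, alpha):
    """Mean of the composite predicted from source means alone."""
    return alpha * mean_a + (1.0 - alpha) * mean_b


if __name__ == "__main__":
    frame_a = [0.0, 4.0, 8.0, 12.0]
    frame_b = [4.0, 4.0, 4.0, 4.0]
    alpha = 0.25
    actual = mean(blend(frame_a, frame_b, alpha))
    predicted = predicted_mean(mean(frame_a), mean(frame_b), alpha)
    print(actual, predicted)
```

The predictor therefore needs only the pre-computed source data and
the blending value, not the source images themselves.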

An important step in the verification process is the 
frame-by-frame alignment of the characteristic data 
streams. The choice of the subset of characteristic data
used for alignment is important to the success of that 
step. Specifically, frame difference measures, such as 
the span described above, are well suited to alignment. 
A coarse-fine strategy is employed, in which anchor 
frames are used to solve the major time-shift between 
the content streams. Once that shift is known, fine 
frame-by-frame alignment takes place. 

An anchor frame is one with a unique structure of
characteristic data in its neighborhood. Fig. 11 shows
the selection of anchor frames for coarse alignment.
Given the frame difference data, for example, the span
sequence, local variance estimation is effected in estimator
57 by means of a sliding window. Processors 58
and 59 produce a list of local variance maxima which 
are above a suitable threshold. A consecutive process- 
ing step in processor 60 estimates the auto-correlation 
of the candidate anchor frame with its frame difference 
data neighborhood. 
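
The first stages of anchor selection (sliding-window variance, then
thresholded local maxima) can be sketched as below. The window size,
the threshold and the tie-breaking rule on plateaus are assumptions
chosen for the example; the auto-correlation stage is not shown.

```python
def local_variance(values, half_window):
    """Variance of `values` inside a sliding window around each index."""
    result = []
    for i in range(len(values)):
        window = values[max(0, i - half_window):i + half_window + 1]
        m = sum(window) / len(window)
        result.append(sum((v - m) ** 2 for v in window) / len(window))
    return result


def anchor_candidates(values, half_window=2, threshold=1.0):
    """Indices where the local variance peaks above `threshold`.

    A peak is an index whose variance is at least that of its left
    neighbour and strictly greater than that of its right neighbour.
    """
    var = local_variance(values, half_window)
    return [i for i in range(1, len(var) - 1)
            if var[i] > threshold and var[i] >= var[i - 1] and var[i] > var[i + 1]]


if __name__ == "__main__":
    # Toy span sequence: a quiet passage with one burst of change.
    span_sequence = [0, 0, 0, 0, 5, 10, 5, 0, 0, 0, 0]
    print(anchor_candidates(span_sequence))
```

Flat passages yield no candidates, which is the intent: only frames
whose neighborhood carries structure are useful for alignment.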

In the step of reference anchor frame selection, a
further criterion may be used to increase the effectiveness
of the alignment step. The anchor frames are graded
by uniqueness, i.e., dissimilarity with other anchor
frames, to reduce the probability of false matches in the
next alignment step. Uniqueness is computed by means
of cross-correlation between the anchor frame and other
anchor frames. Each anchor frame is associated with the
number of other anchor frames whose cross-correlation
with it is lower than a specified threshold, and those
frames with the highest uniqueness are selected.
Uniqueness pruning is applied only to the reference
anchor frames.

Given the anchor frames of the reference and actual
streams, coarse alignment now begins. Each pair of
reference and actual anchor frames for which the
cross-correlation between their respective neighborhoods is
above threshold yields a plausible alignment offset,
expressed in frame count. All pairs are tested and the
offsets are stored in an offset histogram array. False
matches passing the cross-correlation tests will be manifested
as random offset values or noise in the histogram.
A nominal case of time-shifted actual content,
with few or no dropped frames, will yield a single peak
in the histogram. In the case of a larger number of missing
or surplus frames, such as a few missing frames at
each transition, the voting process described above will
produce several peaks, each corresponding to a significant
shift.
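
The voting step can be illustrated with a short sketch. It assumes the
anchor pairs that passed the cross-correlation test are given as
(reference_index, actual_index) tuples; the minimum vote count used to
separate peaks from noise is an illustrative parameter.

```python
from collections import Counter


def offset_histogram(matched_pairs):
    """Count how many matched anchor pairs vote for each frame offset."""
    return Counter(actual - reference for reference, actual in matched_pairs)


def dominant_offsets(histogram, min_votes=2):
    """Offsets supported by at least `min_votes` anchor pairs."""
    return sorted(offset for offset, votes in histogram.items()
                  if votes >= min_votes)


if __name__ == "__main__":
    pairs = [(10, 13), (50, 53), (90, 93),   # shift of 3 frames
             (130, 134), (170, 174),         # one more frame of shift later on
             (20, 71)]                       # a false match
    histogram = offset_histogram(pairs)
    print(dominant_offsets(histogram))
```

The isolated vote from the false match stays below the vote threshold,
while the two genuine shifts appear as separate peaks.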

Having solved the time-shift between corresponding
stream characteristic data intervals which are
bounded by matched anchor frames, the respective intervals
have to be matched. The matching process can
be described as a sequence of edit operators which
transform the first interval of frame characteristic data
to the second interval. The sequence consists of three
such operators:

1) deletion of a frame from a first stream;

2) insertion of a frame to a first stream; and

3) replacement of a frame from a first stream with a
frame from a second stream.

Having associated a cost with each of these operations,
the fine frame alignment problem has now been
transformed to finding a minimum cost sequence of operators
which implements the transformation. If m is the
length of the first interval and n is the length of the second
interval in frames, then the matching problem can
be solved in space and time proportional to (m*n). All
that remains is to set the respective costs. Deletion and
insertion can be assigned a fixed cost each, based on
a priori information on the probability of dropped or surplus
frames. Replacement is a distance measure on the
characteristic data vector, such as weighted Euclidean
distance.
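
A minimal dynamic-programming sketch of the minimum-cost edit
sequence follows. It uses scalar characteristic data and an absolute
difference as the replacement cost purely for illustration; the fixed
deletion and insertion costs are placeholder values, and the function
name is an assumption.

```python
def align(first, second, replace_cost, delete_cost=1.0, insert_cost=1.0):
    """Return (total_cost, operations) transforming `first` into `second`.

    Operations are ("replace", i, j), ("delete", i, None) or
    ("insert", None, j), with i and j indexing `first` and `second`.
    A "replace" of identical frames simply costs zero.
    """
    m, n = len(first), len(second)
    cost = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = cost[i - 1][0] + delete_cost
    for j in range(1, n + 1):
        cost[0][j] = cost[0][j - 1] + insert_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost[i][j] = min(
                cost[i - 1][j] + delete_cost,
                cost[i][j - 1] + insert_cost,
                cost[i - 1][j - 1] + replace_cost(first[i - 1], second[j - 1]),
            )
    # Trace back one optimal operator sequence from the table.
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and cost[i][j] ==
                cost[i - 1][j - 1] + replace_cost(first[i - 1], second[j - 1])):
            ops.append(("replace", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + delete_cost:
            ops.append(("delete", i - 1, None))
            i -= 1
        else:
            ops.append(("insert", None, j - 1))
            j -= 1
    ops.reverse()
    return cost[m][n], ops


if __name__ == "__main__":
    reference = [1, 2, 3, 4]
    actual = [1, 2, 4]          # frame with value 3 was dropped
    total, ops = align(reference, actual, lambda a, b: abs(a - b))
    print(total, ops)
```

The table has (m+1)*(n+1) entries, matching the space and time bound
stated above; deletions and insertions in the traced sequence give the
indices of missing or surplus frames.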

Fig. 12 shows the alignment of a composite stream
with the component reference streams by means of a
processor 61 and geometric filter 62. In a simple case,
sub-title graphics of the language of choice are combined
with the video frame sequence. The location of
sub-titles in the video frame can be specified either manually,
in the characteristic data design workstation as described
below, or can be automatically computed, based
on analysis of the sub-title sub-picture stream. For that
simple case, video frame verification is done in the image
region free from sub-titles. Additionally, sub-title picture
verification is done in the sub-title image region.

A more difficult case is when graphics are overlaid
on the video frame, such as in the case of displaying a
menu in a DVD player. The location of menu bullets and
text may be, for example, as illustrated in Fig. 6. For that
specific case, it is assumed that the graphics stream has
been pre-processed to extract the graphics regions of
support, in the form of bounding rectangles for text lines
and graphics primitives. These regions are stored as
auxiliary characteristic data. By comparing graphics
stream characteristic data with composite video frame
stream graphics characteristic data in the respective
graphics regions, the streams can be aligned. Once
aligned, the composite frame graphics regions are
known to be those of the corresponding graphics
stream. Then, based on these regions, only color and
texture actual frame characteristic data which are not
occluded by overlay graphics (see Fig. 6) are compared
with the respective reference data.

Fig. 13 depicts the frame verification processes performed
by the frame characteristic data comparator 32
(Fig. 1), which start from aligned characteristic data
streams. It is important to note that the characteristic
data alignment processor 30 detects a variety of content
problems. Failure in alignment may be due to the fact
that a wrong content stream is playing, or the content
stream is severely time-shifted, or the stream is distorted
beyond recognition. A successful alignment yields
the indices of missing or surplus frames. Once aligned,
each actual content frame is compared with the corresponding
reference frame, based on the characteristic
data.

Then, for the remaining data, frame-by-frame comparison
can take place in processors 63, 64 and 65 and
comparators 66 and 67. The distance between
characteristic data of corresponding frames detects quality
problems such as luminance or color change, as well as 
audio distortions. By comparing graphics characteristic 
data, errors in sub-picture content and overlay may be 
detected. Also, by comparing characteristic data sensi- 
tive to compression artifacts, such artifacts can be de- 
tected. 

The comparison process requires the notions of
distance and threshold. For vector characteristic data such
as color, luminance and audio, a vector distance
measure is used, such as the Mahalanobis distance:

$$D = (X - X')^{T} C^{-1} (X - X')$$

where X and X' are the reference and actual characteristic
data vectors and C is the covariance matrix, which models
pairwise relationships among the individual
characteristic data. The proper threshold may be computed at a
training phase, using the characteristic data design
workstation described hereinafter with reference to Fig.
14.
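
Purely as a sketch, the Mahalanobis distance between a reference and an actual characteristic data vector may be computed as follows, without forming the inverse of the covariance matrix explicitly. The two-component vectors, the covariance values and the threshold are hypothetical; in the arrangement described, the threshold would come from the training phase.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis_sq(x_ref: Vector, x_act: Vector, cov: Matrix) -> float:
    """Quadratic form d^T C^-1 d with d = x_ref - x_act (no square root)."""
    d = [r - a for r, a in zip(x_ref, x_act)]
    z = solve(cov, d)  # z = C^-1 d
    return sum(di * zi for di, zi in zip(d, z))


# Hypothetical two-component characteristic data, e.g. two colour moments.
cov = [[4.0, 0.0], [0.0, 1.0]]
reference = [100.0, 50.0]
actual = [104.0, 51.0]
THRESHOLD = 3.0  # assumed value, not taken from the specification

distance = mahalanobis_sq(reference, actual, cov)
print(distance, distance > THRESHOLD)
```

A frame whose distance exceeds the threshold would be flagged as a candidate quality problem; correlated components are handled through the off-diagonal terms of the covariance matrix.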

Comparator 68 compares blockiness characteristic
data derived from the reference and actual video
frames, respectively. Such data may include power
estimates of a filter designed to enhance an edge grid
structure in which, for example, the grid spacing equals
the compression block size, which is usually 8 or 16. By
comparing these estimates with the reference value, an
increase in blockiness may be detected. As described
above, absolute blockiness may be misleading, since it
may originate from the original frame texture.
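
A minimal sketch of one possible blockiness estimate is given below. It is a simplified stand-in for the filter mentioned above, not the filter itself: it considers horizontal steps only and takes the ratio of the mean squared luminance step on block boundaries to that elsewhere. The name blockiness and the test rows are hypothetical.

```python
from typing import List

Image = List[List[float]]  # rows of luminance samples


def blockiness(img: Image, block: int = 8) -> float:
    """Mean squared horizontal step on the block grid divided by that off it.

    Assumes every row is wider than one block, so both sets are non-empty.
    """
    on_grid: List[float] = []
    off_grid: List[float] = []
    for row in img:
        for x in range(1, len(row)):
            step_sq = (row[x] - row[x - 1]) ** 2
            (on_grid if x % block == 0 else off_grid).append(step_sq)
    mean_on = sum(on_grid) / len(on_grid)
    mean_off = sum(off_grid) / len(off_grid)
    return mean_on / (mean_off + 1e-9)  # small constant avoids division by zero


WIDTH, BLOCK = 32, 8
# Hypothetical reference row: a smooth luminance ramp.
reference = [[float(x) for x in range(WIDTH)]]
# Hypothetical actual row: the same ramp flattened inside each 8-pixel
# block, which leaves a visible jump at every block boundary.
actual = [[float(BLOCK * (x // BLOCK)) + 0.5 * (x % BLOCK)
           for x in range(WIDTH)]]

ref_score = blockiness(reference, BLOCK)
act_score = blockiness(actual, BLOCK)
print(act_score > ref_score)
```

Comparing act_score with ref_score, rather than with a fixed absolute level, reflects the point made above that texture in the original frame may itself look blocky.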

Comparison of sub-pictures can be done at bitmap
level, by the exclusive OR of the corresponding bitmaps,
by computing the distance between corresponding
shape characteristic data vectors, or by comparing
recognized sub-title text strings, where applicable.
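
The bitmap-level comparison may, for example, be sketched as a count of the pixels set in the exclusive OR of two binary bitmaps. The list-of-rows bitmap representation and the name xor_mismatch are assumptions of this sketch.

```python
from typing import List

Bitmap = List[List[int]]  # rows of 0/1 pixels


def xor_mismatch(ref: Bitmap, act: Bitmap) -> int:
    """Count pixels where two equally sized binary bitmaps differ."""
    if len(ref) != len(act) or any(len(r) != len(a) for r, a in zip(ref, act)):
        raise ValueError("bitmaps must have the same dimensions")
    return sum(p ^ q for r, a in zip(ref, act) for p, q in zip(r, a))


# Hypothetical 2x4 sub-picture bitmaps differing in a single pixel.
ref_bitmap = [[0, 1, 1, 0],
              [0, 1, 1, 0]]
act_bitmap = [[0, 1, 1, 0],
              [0, 1, 0, 0]]

differing = xor_mismatch(ref_bitmap, act_bitmap)
print(differing)
```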

The term "frame-by-frame," which is used in
conjunction with the comparison process, relates to the fact
that once the content streams are aligned, inspection of
every frame with the corresponding frame can be done.
Clearly, comparison may include all frames or a sub-set
of the frames.

The efficiency and robustness of content verification
could be enhanced by using features that have greater
discriminating power over the full reference content. By
designing a software-configurable characteristic data
set, only those data of the full implemented set which
are actually needed will be enabled.

Fig. 14 shows a characteristic data design
workstation 69. The characteristic data acquisition part of the
workstation replicates the reference content
processing front-end of Fig. 1. In addition, workstation 69 has
access, by network 70, to the actual content data and
not just to the characteristic data, for display at 71 and
further analysis at 72.

The development of the specific content verification
application is conducted using a combination of manual,
semi-automatic and automatic processes. For example,
the user may specify the sub-titling type-face and its
location in the video frame. Additionally, the user may
select several representative content segments and the
system then extracts a full characteristic data set,
possibly in multiple passes or slower than real-time,
ranking the features by their discriminating power over
the sample reference content and retaining the best
features.
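
The ranking step may be illustrated by the sketch below. The specification does not say how discriminating power is measured; the sketch assumes, purely for illustration, that it is the variance of a feature over the sample frames and that the features have been scaled to comparable ranges beforehand. The feature names and values are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List


def rank_features(samples: Dict[str, List[float]], keep: int) -> List[str]:
    """Return the `keep` feature names with the largest variance over the samples.

    Variance is an assumed proxy for discriminating power: a feature that
    barely changes from frame to frame cannot tell frames apart.
    """
    scores = {name: pvariance(values) for name, values in samples.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:keep]


# Hypothetical per-frame feature values from a sample content segment.
samples = {
    "mean_luma": [10.0, 50.0, 90.0, 30.0],
    "edge_density": [0.20, 0.21, 0.20, 0.19],
    "audio_energy": [1.0, 5.0, 2.0, 8.0],
}

retained = rank_features(samples, keep=2)
print(retained)
```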

It will be evident to those skilled in the art that the
invention is not limited to the details of the foregoing
illustrated embodiments, and that the present invention
may be embodied in other specific forms without
departing from the spirit or essential attributes thereof. The
present embodiments are, therefore, to be considered
in all respects as illustrative and not restrictive, the
scope of the invention being indicated by the appended
claims rather than by the foregoing description, and all
changes which come within the meaning and range of
equivalency of the claims are, therefore, intended to be
embraced therein.

The method of the invention may further comprise 
the step of computing actual characteristic data from at 
least part of the actual broadcast or playback content 
streams. It may also comprise the step of computing ref- 
erence characteristic data from at least part of said ref- 
erence content streams. 

Said reference characteristic data may be derived 
from video frame sequences, still images, audio and
graphics, and said actual characteristic data may be de- 
rived from a video sequence and an audio channel. Al- 
so, said video image sequence characteristic data may 
include an image motion vector field, or data derived 
from an image difference signal, and said video frame 
or still image characteristic data may include luminance 
statistics in predefined regions of said frame or image. 

Preferably, said video frame or still image charac- 
teristic data also include texture characteristic data and/ 
or colour data, said colour characteristic data include 
colour moments, said video frame or still image charac- 
teristic data also include a low resolution or highly com- 
pressed version of the original image, said audio char- 
acteristic data include audio signal parameters, estimat- 
ed at a window size which is comparable with video 
frame duration, said graphics characteristic data exhibit 
printed text, and said graphics characteristic data also 
exhibit common graphics elements, including bullets 
and highlighted rectangles. 

In the method of the invention, said step of
predicting may include generating a characteristic data stream
from source streams and navigation commands or
playlists, branching from one source stream to another
source stream. Said step of predicting may also include
generating a characteristic data stream from source
streams and transition commands such as cut, dissolve,
fade to/from black, or said step may include computing
characteristic data of graphics sub-pictures overlay on
a video image sequence or still.
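
As a sketch only, prediction of a characteristic data stream from two source streams and a transition command may look as follows. It assumes a characteristic value that mixes linearly under a cross-fade, such as mean luminance under linear pixel blending; other data, such as texture, would need a different rule. The functions cut and dissolve and the weighting schedule are hypothetical.

```python
from typing import List


def cut(a: List[float], b: List[float]) -> List[float]:
    """Predicted stream for a hard cut from source a to source b."""
    return a + b


def dissolve(a: List[float], b: List[float], length: int) -> List[float]:
    """Predicted stream when the last `length` frames of a cross-fade into b."""
    if length > min(len(a), len(b)):
        raise ValueError("dissolve is longer than a source segment")
    head, tail_a = a[:len(a) - length], a[len(a) - length:]
    mixed = []
    for i in range(length):
        w = (i + 1) / (length + 1)  # weight of the incoming stream
        mixed.append((1 - w) * tail_a[i] + w * b[i])
    return head + mixed + b[length:]


# Hypothetical mean-luminance streams of two source segments.
source_a = [100.0, 100.0, 100.0, 100.0]
source_b = [20.0, 20.0, 20.0, 20.0]

predicted = dissolve(source_a, source_b, length=3)
print(predicted)
```

The predicted stream, rather than the characteristic data of either source alone, would then serve as the reference against which the actual stream is aligned and compared.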

The evaluation of the information content of a
certain frame may be based on the temporal variation of
characteristic data in said frame and in its adjacent
frames.

The method may further comprise grading the
information content of all frames in a sequence, denoting
frames with locally maximal information content as
anchor frames.
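
The grading and anchor selection may be sketched as below for a single scalar characteristic value per frame. Measuring temporal variation as the sum of the absolute steps to the two adjacent frames is an assumption of this sketch, as are the names information_content and anchor_frames.

```python
from typing import List


def information_content(feature: List[float]) -> List[float]:
    """Temporal variation of one characteristic value around each frame."""
    n = len(feature)
    info = [0.0] * n
    for i in range(n):
        prev_step = abs(feature[i] - feature[i - 1]) if i > 0 else 0.0
        next_step = abs(feature[i + 1] - feature[i]) if i < n - 1 else 0.0
        info[i] = prev_step + next_step
    return info


def anchor_frames(info: List[float]) -> List[int]:
    """Indices whose grade is a strict local maximum."""
    return [i for i in range(1, len(info) - 1)
            if info[i] > info[i - 1] and info[i] > info[i + 1]]


# Hypothetical per-frame mean luminance with a flash and a brightness peak.
luma = [10.0, 10.0, 40.0, 10.0, 10.0, 10.0, 30.0, 60.0, 30.0, 10.0]

grades = information_content(luma)
anchors = anchor_frames(grades)
print(anchors)
```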

The method may still further comprise evaluating
the similarity between two anchor points, based on a
measure of temporal correlation between the respective
sets of neighbouring characteristic data. Alternatively,
the method may further comprise evaluating the
similarity between all pairs of anchor frames, such that, for
each pair, one frame is from the reference data and the
other is from the actual data.
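
One way to picture the pairwise evaluation is sketched below: each reference anchor is compared with each actual anchor by the Pearson correlation of the characteristic data in a small window around it, and sufficiently similar pairs vote for a frame-index offset. The window size, the correlation floor and the voting scheme are assumptions of this sketch, and the anchor indices are taken as given.

```python
from collections import Counter
from math import sqrt
from typing import List, Optional


def window(data: List[float], centre: int, half: int) -> List[float]:
    """Neighbouring values around `centre`; empty if the window leaves the stream."""
    if centre - half < 0 or centre + half >= len(data):
        return []
    return data[centre - half:centre + half + 1]


def correlation(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length windows (0.0 if undefined)."""
    if not a or len(a) != len(b):
        return 0.0
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def likely_offset(ref: List[float], act: List[float],
                  ref_anchors: List[int], act_anchors: List[int],
                  half: int = 2, min_corr: float = 0.95) -> Optional[int]:
    """Vote over all anchor pairs; return the most common offset, or None."""
    votes: Counter = Counter()
    for r in ref_anchors:
        for a in act_anchors:
            if correlation(window(ref, r, half), window(act, a, half)) >= min_corr:
                votes[a - r] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical data: the actual stream is the reference delayed by 3 frames.
ref_luma = [10.0, 10.0, 40.0, 10.0, 10.0, 10.0, 30.0, 60.0, 30.0, 10.0]
act_luma = [10.0, 10.0, 10.0] + ref_luma

offset = likely_offset(ref_luma, act_luma,
                       ref_anchors=[2, 7], act_anchors=[5, 10])
print(offset)
```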

The method may further comprise reporting said
alignment results, including the time shift between the
designed and actual content broadcast-playback, as
well as an indication of missing or surplus frames. The
step of comparing may comprise first aligning the
graphics of said composite frame sequence with said
reference graphics streams, and the step of aligning may
facilitate computing the location of all overlaid graphics in
said composite frame sequence. The step of computing
may facilitate filtering out colour and texture actual
frame characteristic data which are occluded by said
overlay graphics.

The method may further comprise comparing char- 
acteristic data of aligned frames to indicate quality or 
content problems, and said problems may be selected
from the group comprising luminance or colour shifts, 
compression artifacts, audio artifacts, and audio or sub- 
pictures mismatch or mis-alignment. 



Claims 

1. A method for video content verification, operative to
compare and verify the content of a first audio-visual
stream with the content of a second audio-visual
stream, the method comprising the steps of:

extracting characteristic data from a first
audio-visual stream;

extracting characteristic data from a second
audio-visual stream; and

comparing the extracted characteristic data
from said first and second audio-visual
streams.

2. A method as claimed in claim 1, wherein the step
of comparison comprises:

aligning said first and second audio-visual
streams on a frame-by-frame basis; and
performing a frame-by-frame comparison of
said aligned streams of frames.

3. A method as claimed in claim 1 or claim 2, wherein
said first and second streams are selected from the
group comprising the elementary content streams,
including video image sequence, audio channel,
and sub-picture streams.

4. A method as claimed in any one of claims 1 to 3,
wherein said comparison of first and second
streams yields at least one parameter, including
time-shift between the desired and the actual timing
of said second stream; list of missing frames in said
second stream; list of surplus frames in said second
stream; sub-title content error; graphics content
error; colour distortion, and luminance shift.

5. A method for video content verification, operative to
compare and verify the content of a first audio-vis- 
ual stream with the content of a second audio-visual 
stream, wherein said second audio-visual content 
stream is defined by at least one source content 
stream and a set of editing instructions, the method 
comprising the steps of: 

extracting characteristic data from said first au- 
dio-visual stream; 

extracting characteristic data from said source
content stream; and

computing characteristic data of said second 
content-stream, based on characteristic data of 
said source content stream and on said editing 
instructions. 

6. A method as claimed in claim 5, wherein said in- 
structions are in the form of an Edit Decision List or 
Digital Video Disk branching instructions. 

7. A method as claimed in any one of claims 1 to 6,
wherein said first or second stream is a reference 
content stream. 

8. A method as claimed in any one of claims 1 to 6, 
wherein said first and/or second streams are actual 
broadcast or playback content streams. 

9. A method as claimed in claim 7, further comprising 
the step of predicting the reference characteristic 
data stream from said reference characteristic data
and from playback instructions. 

10. A method as claimed in any one of claims 1 to 9, 
wherein said characteristic data extraction is op- 
tionally augmented by user input facilitating the ex- 
traction/relative weighting of said data. 

11. A method as claimed in claim 7, further comprising 
aligning the reference characteristic data stream 
with the actual characteristic data stream, on a 
frame-by-frame basis, and evaluating the informa- 
tion content of a certain frame. 

12. A method as claimed in claim 11, further comprising
computing the frame-index offset between the
reference and actual frames, based on the most likely
offsets derived from evaluation of the similarity
between all anchor frames.

13. A method as claimed in claim 11, further comprising
matching the reference frame sequence with the
actual frame sequence, based on an identified
frame-index offset, and further comprising the step
of designating an actual frame as a surplus frame,
or assigning to it a unique reference frame.



14. A method as claimed in any one of claims 1 to 13, 
further comparing a composite video frame se- 
quence including graphics overlaid on a video 
frame sequence, with component reference
streams consisting of the original video frame
sequence as well as the graphics streams.

15. A system for audio-visual content verification,
operative to compare and verify the content of a first
audio-visual data stream with the content of a second
audio-visual data stream, the system comprising:

means for extracting characteristic data from a
first audio-visual data stream;
means for extracting characteristic data from a
second audio-visual data stream; and
means for comparing characteristic data of said
first and second audio-visual data streams.

16. A system as claimed in claim 15, wherein said
comparison means comprises:

means for aligning said audio-visual data
streams on a frame-by-frame basis; and
means for frame-by-frame comparison of said
aligned data streams.



17. A system as claimed in claim 15 or claim 16,
wherein said first and second data streams are selected
from the group comprising video image sequence,
audio channel, and sub-picture data streams.



18. A system as claimed in any one of claims 15 to 17,
wherein said means for comparison of said
reference data streams yields at least one of the
parameters including time-shift between the desired and
the actual timing of said second data stream; list of
missing frames in said second data stream; list of
surplus frames in said second data stream; sub-title
content error; graphics content error; colour
distortion, and luminance shift.

19. A system for audio-visual content verification,
operative to compare and verify the content of a first
audio-visual data stream with the content of a second
audio-visual data stream, wherein said second
audio-visual data stream is defined by at least one
source content data stream and a set of editing
instructions, the system comprising:

means for extracting characteristic data from
said first audio-visual data stream;
means for extracting characteristic data from
said source content data stream; and
means for computing characteristic data of said
second content data stream, based on
characteristic data of said source content data
stream and said editing instructions.

20. A system as claimed in claim 19, wherein said
editing instructions are in the form of an Edit Decision
List or Digital Video Disk branching instructions.